It is a compound probability distribution: a probability vector p is drawn from a Dirichlet distribution with parameter vector α, and an observation x is then drawn from a multinomial distribution with probability vector p and number of trials n.
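The two-stage draw described above can be sketched with the Python standard library. This is only an illustrative sketch: the function name `sample_dirichlet_multinomial` is hypothetical, the Dirichlet draw uses the standard normalized-Gamma construction, and every αk is assumed to be strictly positive.

```python
import random

def sample_dirichlet_multinomial(alpha, n, rng=random):
    """Draw one count vector x with sum(x) == n (hypothetical helper)."""
    # Stage 1: p ~ Dirichlet(alpha), obtained by normalizing
    # independent Gamma(alpha_k, 1) draws.
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    p = [g / total for g in gammas]

    # Stage 2: x ~ Multinomial(n, p), obtained by tallying
    # n independent categorical draws with weights p.
    counts = [0] * len(alpha)
    for k in rng.choices(range(len(alpha)), weights=p, k=n):
        counts[k] += 1
    return counts

if __name__ == "__main__":
    rng = random.Random(0)  # seeded generator for repeatability
    print(sample_dirichlet_multinomial([1.0, 2.0, 3.0], 10, rng))
```

Each call draws a fresh p, so repeated samples are more dispersed than draws from a single multinomial with fixed probabilities.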
Specification
$$\Pr(x \mid \alpha) = \int_{p} \Pr(x \mid p)\,\Pr(p \mid \alpha)\,dp$$
which results in the following explicit formula:
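$$\Pr(x \mid \alpha) = \frac{n!\,\Gamma(\alpha_0)}{\Gamma(n + \alpha_0)} \prod_{k=1}^{K} \frac{\Gamma(x_k + \alpha_k)}{x_k!\,\Gamma(\alpha_k)} = \frac{n\,B(\alpha_0, n)}{\prod_{k:\,x_k > 0} x_k\,B(\alpha_k, x_k)}$$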
where α0 is defined as the sum
$$\alpha_0 = \sum_{k=1}^{K} \alpha_k$$
The latter form emphasizes the fact that zero-count categories can be ignored in the calculation.
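As a numerical check on the explicit formula, the sketch below evaluates both forms in log space with `math.lgamma` from the standard library. The function names (`dm_log_pmf`, `dm_log_pmf_beta`, `log_beta`) are hypothetical, and x is assumed to be a vector of non-negative integer counts with all αk > 0.

```python
from math import lgamma, log, exp

def dm_log_pmf(x, alpha):
    """Log of the Gamma-function form of Pr(x | alpha)."""
    n = sum(x)
    a0 = sum(alpha)
    out = lgamma(n + 1) + lgamma(a0) - lgamma(n + a0)
    for xk, ak in zip(x, alpha):
        out += lgamma(xk + ak) - lgamma(xk + 1) - lgamma(ak)
    return out

def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def dm_log_pmf_beta(x, alpha):
    """Log of the Beta-function form; zero-count categories are skipped."""
    n = sum(x)
    if n == 0:
        return 0.0  # the empty observation has probability 1
    out = log(n) + log_beta(sum(alpha), n)
    for xk, ak in zip(x, alpha):
        if xk > 0:
            out -= log(xk) + log_beta(ak, xk)
    return out

if __name__ == "__main__":
    x, alpha = [2, 0, 1], [1.0, 2.0, 3.0]
    print(exp(dm_log_pmf(x, alpha)), exp(dm_log_pmf_beta(x, alpha)))
```

Working in log space avoids overflow in the Gamma function for large counts or large α; the two functions should agree up to floating-point rounding.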
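- It reduces to the categorical distribution as a special case when n = 1:
$$\Pr(x \mid \alpha) = \frac{\alpha_k}{\alpha_0}$$
where k is the single category with x_k = 1, and αk is seen as the unnormalized probability of each category.
- It approximates the multinomial distribution arbitrarily well for large α: as α0 grows with the proportions αk/α0 held fixed, it approaches the multinomial distribution with probabilities αk/α0.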
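Both properties can be checked numerically. The sketch below is illustrative only: the helper names `dm_pmf` and `multinomial_pmf` are hypothetical, and "large α" is modeled by multiplying every αk by a common scale factor so that the proportions αk/α0 stay fixed.

```python
from math import lgamma, log, exp

def dm_pmf(x, alpha):
    """Dirichlet-multinomial probability of the count vector x."""
    n, a0 = sum(x), sum(alpha)
    out = lgamma(n + 1) + lgamma(a0) - lgamma(n + a0)
    for xk, ak in zip(x, alpha):
        out += lgamma(xk + ak) - lgamma(xk + 1) - lgamma(ak)
    return exp(out)

def multinomial_pmf(x, p):
    """Multinomial probability of x with category probabilities p."""
    out = lgamma(sum(x) + 1)
    for xk, pk in zip(x, p):
        out -= lgamma(xk + 1)
        if xk > 0:  # skip zero counts so log(0) is never evaluated
            out += xk * log(pk)
    return exp(out)

alpha = [1.0, 2.0, 3.0]
a0 = sum(alpha)

# n = 1: the probability of a one-hot vector should equal alpha_k / alpha_0.
for k in range(len(alpha)):
    one_hot = [1 if j == k else 0 for j in range(len(alpha))]
    print(k, dm_pmf(one_hot, alpha), alpha[k] / a0)

# Large alpha: the gap to the multinomial with p = alpha / alpha_0
# should shrink as the common scale factor grows.
x = [2, 1, 3]
target = multinomial_pmf(x, [a / a0 for a in alpha])
for scale in (1, 100, 10000):
    scaled = [scale * a for a in alpha]
    print(scale, abs(dm_pmf(x, scaled) - target))
```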
Reference
Dirichlet-multinomial Distribution: https://en.wikipedia.org/wiki/Dirichlet-multinomial_distribution